1  Introduction

1.1 R Packages

Before starting, you will need the following R packages.

The following is a list of packages that are used throughout this book that need to be loaded before any analysis. A complete list of all the packages used in the book can be found in Chapter 5.

Code
# General Packages:
library(tidyverse)
library(tools)
library(parallel)
library(boot) 
library(table1)
# Packages for Plotting:
library(ggplot2)
library(cowplot) 
library(ComplexHeatmap) 
library(ggh4x)
# Packages for High Dimensional Mediation:
library(HIMA)
library(xtune)
library(RMediation)
library(glmnet)
# Packages for Mediation with Latent Factors:
library(r.jive)
# Packages for Quasi-mediation:
library(LUCIDus)
library(mclust)
library(networkD3)
library(plotly)
library(htmlwidgets)
library(glasso)
library(nnet)
library(progress)
library(jsonlite)

In order to replicate the style of the figures in this book, you will also have to set the ggplot theme:

Code
ggplot2::theme_set(cowplot::theme_cowplot())

1.2 Custom Functions

The analyses in this book rely on several custom functions. The code for functions are provided in Chapter 6.

1.3 The Data

The data used in this project is based off of simulated data from the Human Early Life Exposome (HELIX) cohort (Vrijheid et al. 2014). The data was simulated for one exposure, five omics layers, and one continuous outcome (after publication, this data will be available on github). The format of this data is a named list with 6 elements. It includes separate numeric matrices for each of the 5 omics layers, as well as the exposure and phenotype data. In all datasets in the list, the rows represent individuals and the columns represent omics features. In this analysis, the exposure and outcome are:

  • Exposure: hs_hg_m_resid, representing maternal mercury levels
  • Outcome: ck18_scaled, representing child liver enzyme levels, a major risk factor for non-alcoholic fatty liver disease (NAFLD).

You can read this data into R using the following code:

Code
# Load simulated data
simulated_data <- read_rds(fs::path(dir_data_hg, "simulated_HELIX_data_2.RDS")) 

# Define exposure and outcome name
covars <- c("e3_sex_None", "hs_child_age_yrs_None")

# Extract exposure and outcome data
# outcomes <- simulated_data[["phenotype"]]
exposure <- simulated_data[["phenotype"]]$hs_hg_m_scaled
outcome  <- simulated_data[["phenotype"]]$ck18_scaled

# Get numeric matrix of covariates 
covs <- simulated_data[["phenotype"]][covars] 
covs$e3_sex_None <- if_else(covs$e3_sex_None == "male", 1, 0)


# create list of omics data 
omics_lst <- simulated_data[-which(names(simulated_data) == "phenotype")]

# Create data frame of omics data
omics_df <- omics_lst %>% 
  purrr::map(~as_tibble(.x, rownames = "name")) %>%
  purrr::reduce(left_join, by = "name") %>%
  column_to_rownames("name")

1.3.1 Descriptive Statistics

Table 1.1 shows the summary statistics for the exposure and phenotype data in this analysis.

Code
table1::table1(~., data = simulated_data[["phenotype"]][,-1])
Table 1.1: Descriptive Statistics for the Simulated Variables
Overall
(N=420)
hs_child_age_yrs_None
Mean (SD) 7.22 (1.04)
Median [Min, Max] 7.18 [3.93, 10.9]
hs_hg_m_scaled
Mean (SD) -0.0148 (1.02)
Median [Min, Max] -0.0433 [-2.73, 3.34]
ck18_scaled
Mean (SD) -0.0511 (1.03)
Median [Min, Max] 0.00883 [-3.56, 3.09]
e3_sex_None
female 192 (45.7%)
male 228 (54.3%)
h_fish_preg_Ter
1 233 (55.5%)
2 88 (21.0%)
3 99 (23.6%)

1.3.2 Mercury exposure and childhood MAFLD risk

Code
lm_res <- lm(ck18_scaled ~ hs_hg_m_scaled + 
     e3_sex_None +
     hs_child_age_yrs_None,
   data = simulated_data[["phenotype"]])

summary(lm_res)

In the simulated data, each 1 standard deviation increase in maternal mercury was associated with a 0.11 standard deviation increase in CK18 enzymes (Figure 1.1; p=0.02), after adjusting for child age and child sex.

Code
ggplot(data = simulated_data[["phenotype"]], 
       aes(x = hs_hg_m_scaled, y = ck18_scaled)) + 
  geom_point() +
  stat_smooth(method = "lm",
              formula = y ~ x ,
              geom = "smooth") +
  xlab("Maternal Mercury Exposure (Scaled)") +
  ylab("CK-18 Levels (Scaled)")

Figure 1.1: Association between maternal mercury and CK18 in the Simulated Data

1.3.3 Correlation of omics features

Figure 1.2 shows the correlation within and between the omics layers in the simulated data.

Code
# Change omics list elements to dataframes
omics_df <- purrr::map(omics_lst, ~as_tibble(.x, rownames = "name")) %>%
  purrr::reduce(left_join, by = "name") %>%
  column_to_rownames("name")

meta_df <- imap_dfr(purrr::map(omics_lst, ~as_tibble(.x)),
                    ~tibble(omic_layer = .y, ftr_name = names(.x)))

# Correlation Matrix
cormat <- cor(omics_df, method = "pearson")

# Annotations
annotation <- data.frame(
  ftr_name = colnames(cormat),
  index = 1:ncol(cormat)) %>%
  left_join(meta_df, by = "ftr_name") %>%
  mutate(omic_layer = toTitleCase(omic_layer))

# Make Plot
Heatmap(cormat, 
        row_split = annotation$omic_layer,
        column_split = annotation$omic_layer,
        show_row_names = FALSE,
        show_column_names = FALSE, 
        column_title_gp = gpar(fontsize = 12),
        row_title_gp = gpar(fontsize = 12),
        heatmap_legend_param = list(title = "Correlation"))

Figure 1.2: Heatmap illustrating the correlation of molecular features within and between different omics layers.